A FIRST-ORDER AUGMENTED LAGRANGIAN METHOD 
FOR COMPRESSED SENSING 
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Abstract. We propose a first-order augmented Lagrangian algorithm (FAL) for solving the basis pursuit problem. FAL 
computes a solution to this problem by inexactly solving a sequence of l\ -regularized least squares sub-problems. These 
sub-problems are solved using an infinite memory proximal gradient algorithm wherein each update reduces to "shrinkage" 
or constrained "shrinkage" . We show that FAL converges to an optimal solution of the basis pursuit problem whenever the 
solution is unique, which is the case with very high probability for compressed sensing problems. We construct a parameter 
sequence such that the corresponding FAL iterates are e-feasible and e-optimal for all e > within O (log (e — x )) FAL iterations. 
Moreover, FAL requires at most 0(t~ 1 ) matrix-vector multiplications of the form Ax or A T y to compute an e-feasible, e-optimal 
solution. We show that FAL can be easily extended to solve the basis pursuit dcnoising problem when there is a non-trivial 
level of noise on the measurements. We report the results of numerical experiments comparing FAL with the state-of-the-art 
solvers for both noisy and noiseless compressed sensing problems. A striking property of FAL that we observed in the numerical 
experiments with randomly generated instances when there is no measurement noise was that FAL always correctly identifies 
^4 the support of the target signal without any thresholding or post-processing, for moderately small error tolerance values. 

psi 1. Introduction. In this paper we propose a new first-order augmented Lagrangian algorithm to solve 

the basis pursuit problem 

CJ min ||a;||i subject to Ax — b, (1-1) 
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where ^-norm ||a;||i := X)™=i \ x i\i x i denotes the i-th component of x £ K™, b £ M m , A £ IR mxn , with 
to <C n, and rank(A) = m, i.e. A has full row rank. The basis pursuit problem appears in the context 
of compressed sensing (CS) [HI [HI [HI H] where the goal is to recover a sparse signal x* from a small set of 
linear measurements or transform values b — Ax*. Candes, Romberg and Tao [6l[8l[9] and Donoho [12] have 
shown that when the target signal a;* is s-sparse, i.e. only s of the n components are non-zero, and the 

_ ^ measurement matrix A £ sft mx ™ satisfies some regularity conditions, the sparse signal x* can be recovered 

by solving the basis pursuit problem (1.1 ) with high probability provided that the number of measurements 

fSJ to = 0(slog(n)). The basis pursuit problem is a linear program (LP). Therefore, computing the sparsest 

solution to the set of linear equations Ax = b, which is an NP-hard problem for general A, can be done 
efficiently, in theory, by solving an LP. 

ly-! However, in typical CS applications the signal dimension n is large, e.g. n « 10 6 , and the LP (1.1) is 

often ill-conditioned. Consequently, general purpose simplex-based LP solvers are unable to solve the LP. 
Moreover, the constraint matrix A is typically dense. Therefore, general purpose interior point methods that 
require factorization of A T A are not practical for solving LPs arising in CS applications. 



On the other hand, in CS applications the A, although dense, still has a lot of structure. In many applica- 
tions, A is a partial transform matrix, e.g. partial discrete cosine transform (DCT), a partial wavelet, or a 
partial pseudo-polar Fourier matrix. Therefore, the matrix-vector product Ax and A T y can be computed in 
O(nlog(n)) time using either the Fast Fourier Transform (FFT) or forward and backward Wavelet trans- 
forms. This fact has been recently exploited by a number of first- order algorithms. In this paper, we propose 
a new first-order augmented Lagrangian algorithm for the basis pursuit problem. Since the basic steps in 
a first-order algorithm are the matrix-vector multiplications in the form of Ax and A T y, we will report 
complexity in terms of the number of such matrix- vector multiplications required to solve the problem. 

1.1. Previous work on first-order algorithms for compressed sensing. When the measurement 
data b contains a non-trivial level of noise, one can solve 

min A||*||i + ||Aa;-6||2, (1.2) 

a;£R" 
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for an appropriately chosen A > depending on the noise level to recover the sparse target signal x* with 
some error proportional to the noise on b [7j- On the other hand, when there is no noise on the measurements, 
b, or when the noise level is low, one can solve ( 1.2 ) for a fixed small A > 0, which can be viewed as a penalty 



approximation to (1.1). 



In [TS] Figueiredo, Nowak and Wright proposed the GPSR algorithm that uses gradient projection method 
with Barzilai-Borwein steps to solve (1.2). Hale, Yin and Zhang [TBIE] proposed to solve (1.2) via the fixed 
point continuation (FPC) algorithm that embeds the soft-thresholding (1ST) algorithm 10J in a continuation 
scheme on A, i.e. FPC begins with A > A and gradually decreases it to A, to recover the sparse solution of 



(1.2). Wen, Yin, Goldfarb and Zhang J33] improved the performance of FPC by adding an active set (AS) 



step. Please note that GPSR, FPC and FPC- AS only converge to the optimal solution of (1.2), not to the 



optimal solution of (1.1). Hence, when there is no noise or when it is low, the solutions produced by these 



algorithms are only good approximations to x*. 

Yin, Osher, Goldfarb and Darbon [25] solve ( |1.1[ ) using a Bregman iterative regularization scheme that 
involves a sequence of problems of the form min^gRn A||cc||i + ^||^4a; — &^|||, where b^ k ' are obtained by 
suitably updating the measurement vector 6, and each sub-problem is solved using FPC. For the basis pursuit 
problem, the so-called Bregman iterative regularization procedure is nothing but the classic augmented 
Lagrangian method. The algorithm YALL1 developed by Yang and Zhang [21], which is an alternating 



direction algorithm, is able to solve the basis pursuit problem (1.1), the penalty formulation (1.2), and the 
basis pursuit denoising problem 



.t. \\Ax-b\\ 2 <6. 



(1.3) 



Bregman iteration based methods [25] and YALL1 [23] provably converge to the optimal solution of the basis 



pursuit problem (1.1); however, their convergence rates are unknown. 



Other algorithms for ^-regularized least squares problem (1.2) include an iterative interior-point solver 18 , 
and an accelerated projected gradient method [IT]. Van den Berg and Friedlander [52] proposed SPGL1 



to solve the penalty formulation (1.3) by solving a sequence of LASSO sub-problems *&(£) = {\\Ax — b\\ 



\\x\\i < t} where parameter t is updated by a Newton step. This algorithm provably converges to the optimal 
solution of (|1.3|; however, the convergence rate is again unknown. 



Aybat and Iyengar [5] have proposed a first-order smoothed penalty algorithm (SPA) to solve the basis 
pursuit problem. SPA iterates {x^ k '}k^z + ar e computed by inexactly solving a sequence of smoothed penalty 
problems of the form 



min |A (fe) p 

;|| 2 <7jC*0 L 



«(*) + /«(*)}, 



,( fe ), 



f (fe) 



where p£'(x) is a smooth approximation of || x ||i, fv^'{x) is a smooth approximation of || Ax — b\\ 2 and ?/ fe ) is 
a suitably chosen bound on the £2-norm of an optimal solution of the fc-th sub-problem. SPA calls Nesterov's 
optimal algorithm for simple sets [HI [20] to solve the sub-problems. SPA iterates provably converge to an 
optimal solution x* of the basis pursuit problem whenever it is unique. Moreover, for all small enough e, SPA 
requires 0{^ne~^) matrix-vector multiplies to compute an e-feasible, i.e. ||Acw — b\\ 2 < e, and e-optimal, 
||a;( fc )||i — ||a;*||i| < e iterate. 

Becker, Bobin and Candes [4J have proposed NESTA for solving the formulation ( |1.3[ ) (NESTA can also be 
used to solve the basis pursuit problem ( |1.1[ ) by setting 5 to 0). NESTA calls Nesterov's optimal gradient 
method for non-smooth convex functions 20J to solve the sub-problems. When the matrix A is orthogonal, 
i.e. AA T ■- I, NESTA requires 0(->Jne~ l ) matrix-vector multiplications to compute a feasible e-optimal 



iterate to (1.3). When the matrix A is a partial transform matrix, i.e. Ax and A T y is 0(n\og(n)), but A is 
not orthogonal, NESTA, in general, needs to compute (A T A + /il) _1 , and therefore, its C(n 3 ) per iteration 
complexity is quite prohibitive for practical applications. Moreover, the sequence of NESTA iterates does 



not converge an optimal solution of (1.3) but to a solution of a smooth approximation of (1.3) 
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1.2. New results. In this paper we propose a first-order augmented Lagrangian (FAL) algorithm that 
solves the basis pursuit problem by inexactly solving a sequence of optimization problems of the form 

min \ X^WxWx ~ \ (k) {9 {k) ) T (Ax -b) + l\\ Ax -bWt], (1.4) 

xGR": \\x\\ 1 <ri l . k ) L 2 J 

for an appropriately chosen sequence {(A^, 0^ k ' ,r]^)}keZ + - Each of these sub-problems are solved using 



a variant (see Figure 2.1| of the infinite-memory proximal gradient algorithm in [3T] (see, also FISTA [3] 



and Nesterov infinite- memory algorithm [20). Each update in this proximal gradient algorithm involves 
computing the gradient A T (Ax — b) of the quadratic term ^||Aaj — 6||| and computing two constrained 



"shrinkage" (see Equation A. 10), which require 0(n\og(n)) work. Hence, the complexity of each update is 
dominated by computing the gradient A 1 (Ax — b) or equivalently two matrix- vector multiplies. 

In Theorem |3.1| in Section [3] we prove that every limit point of the FAL iterate sequence is an optimal 
solution of (|1.1| . Thus, the FAL iterates converge to the optimal solution when the solution is unique. In 
Theorem 3.4 we show that for all e > 0, the FAL iterates x^ k ' are e-feasible, i.e. ||^4x^ fe ^ — b\\2 < e, and 



e-optimal, I ||x( fe )||i — ||x*||i| < e, for k > 0(log(e 1 )). Moreover, FAL requires at most 0(ne 1 ) matrix- 



vector multiplications to compute an e-feasible, e-optimal solution to (1.1). Thus, the overall complexity of 
FAL computing an e-feasible and e-optimal iterate is 0{r? log(n)e~ 1 ) in the CS context. And in Sectional 
we briefly discuss how to extend FAL to solve the noisy recovery problem min,,; e Rn{||2;||i : \\Ax — &|| 2 < 8}. 

In Section [6] we report the results of our experiments with FAL. We tested FAL on randomly generated 
problems both with and without measurement noise and also on known hard instances of the CS problems. 
We compared the performance of FAL with SPA [2], NESTA [4], FPC [17], FPC-AS [23], YALL1 [24] and 
SPGL1 [52]. On randomly generated problem instances FAL is at least two times faster than all the other 
solvers. On known hard CS instances the run times of FAL were of the same order of magnitude as the best 
solver; but FAL was able to identify significantly sparser solutions. We also observed that for all randomly 
generated instances with no measurement noise FAL always correctly identified the support of the target 
signal x* , without any additional heuristic thresholding, when the error tolerance was set to moderate values. 
Once the support is known, the signal x* can often be very accurately computed by solving a set of linear 



equations. Moreover, although the bound in Theorem 3.4 implies that FAL requires O (-) matrix-vector 
multiplies to compute an e-feasible, e-optimal solution, m practice we observed that FAL required only 
O (log (^)) matrix- vector multiplies to compute an e-feasible, e-optimal solution. 

FAL is superior to SPA [2] both in terms of the theoretical guarantees as well as practical performance on the 
basis pursuit problem. However, FAL explicitly uses the structure of the i^-norm and is, therefore, restricted 
to basis pursuit and related problems. On the other hand, SPA can be extended easily to solve the following 
much larger class non-smooth convex optimization problems: 

min max ue u{(f)(x,u)}, 
subject to \\Ax — 6|| 7 < 5, 

where U is a compact convex set and </> : K™ x U — > K is a bi-affine function [20], and 7 S {1, 2, 00}. This class 
includes as special cases, basis pursuit, matrix games with side constraints, group LASSO, and problems of 
the form min{^^ =1 ||Sfcx||i : Ax = b} that appears in the context of reconstructing a piecewise flat sparse 
image. 

2. Preliminaries. In this section we state and briefly discuss the details of a particular variant of 
Tseng's Algorithm 3 in [21] that we use in FAL. Algorithm 3 [21] computes e-optimal solutions for the 
optimization problem 

mmp{x) + f(x), (2.1) 

x£F 



where /, p and F satisfy the following conditions. 

s (lsc) and convex function, and domp closed, 

(2.2) 



(i) p : W 1 — > R proper, lower-semicontinuous (lsc) and convex function, and domp closed, 

(ii) / : M. n — > M. proper, lsc, convex function, differentiable on an open set containing domp, 

(iii) V/ is Lipschitz continuous on domp with constant L, 

(iv) F n axgmin a . gR «{p(a:) + f(x)} £ 0. 



We refer to a function h : K.™ ->Rasa prox function if h is differentiable and strongly convex function with 
convexity parameter c > 0, i.e. h(y) > h(x) + Vh(x) T (y ~ x) + |||y — a;||| for all x,y € dom/i. 

Algorithm APG(p, /, L, F, x^\h, APGstop) 



u (0) ^_ x (o) ; w (o) <_ argni in 2:edomp /i(.T), #°) <- 1, £«- 
while (APGstop is false) do 
W M <- (1 _ 0W) tt W + tfWtuM 

u,(i+D <_ argmin {^to ^T (p(*) + V/(«W) T z) + £ ft(«) : z g f} 

ffW(a!) :=p(x)+V/(wW) T a;+f||a;-«W||2 



</ 



(*+i) 



«— argmin{if ^ (x) : ic G F} 



0(t+l) <_ V(tfW)*-4(tfW)»-(tfW) 2 

£ <- £ + 1 
end while 
return u" or u" depending on APGstop 



Fig. 2.1. Accelerated Proximal Gradient Algorithm 



Our variant of Algorithm 3 in [ST] is displayed in Figure 2.1 Algorithm APG takes as input the functions 
/ and p, a prox function h, the set F, an initial iterate x^ and a stopping criterion APGstop. 



Lemma 2.1. Suppose p, f and F satisfy (2.2). Let h be a prox function on an open set containing domp and 
m i n x£domph(x) — 0. Fix e > and let {u^ , v ^ , w *•' }^gz be the sequence generated by Algorithm APG 

displayed in Figure 2.1 Then p(u ( - e + 1 ^) + /(w (£+1) ) < min xeR n {p(x) + f(x)} + e for all t > J^ h(x*) - 1, 
where x* G argrnin a . eR „ {p(x) + f(x)}. 

Proof. Corollary 3 in (2JJ implies this result provided that 7J(u( £+1 ') < H(u^ e+1 ^) holds for all I > 0. Using 
induction, it is easy to show that this is true when the update rule ffh is used for all (. > 1. D 

3. Convergence Properties of FAL. In this section, we describe FAL and prove the main convergence 
results for the algorithm. The outline of FAL is given in Figure |3.1| Algorithm FAL takes as inputs a 
sequence of {(A' fe ', e^ k \ r^)}fegz + , a starting point x^ and a bound n on the £i-norm of an optimal solution 
x* of the basis pursuit problem. One such bound r\ can be computed as follows. Let x = argmin{||a;||2 : 
Ax = b} = A T (AA T )~ 1 b. Clearly, ||x*||i <n:= \\x\\\. We will next describe each of the steps in this outline. 



An augmented Lagrangian function for the basis pursuit problem (1.1) can be written as 

P(x) := X\\x\\i - X6 T (Ax -b) + h\Ax- 6|||, 

where A is the penalty parameter and 9 is a dual variable for the constraints Ax = b. From Lines [4][7] in 
Figure |3.1| it follows that in the fc-th iteration of Algorithm FAL we inexactly minimize the augmented 
Lagrangian function 

p^ k \x)^\^\\x\\ 1 + ^\\Ax-b-\^e^\\l = \^\\x\\ 1 -\^\e^) T 

over the set F^ :— {x : \\x\\i < r/^} using Algorithm APG with the prox function h^ — |||x — a;^ fe_1 ^|||. 

1 



Algorithm FAL({{X^ k \e^\T ( ^)} keZ+ ,x^\r]) 
l: 0W =0,k^0,L^- d max (AA T ) 



while (FALstop is false) do 

k <- fc + 1 

p(*)(ic) :=AW||ar||i, / (fc) (^) := |||Ab- 6- AW^W||| 



2 
3 
1 
5: 
6: nW<-TI+*¥-\\0W\\$ 



~2~»" 112 

7: FW :={xE W l : \\x\\i < f]^} 

8: ^2x^ ( r max (A)(r ? ( fe ) + ||.T( fc - l )|| 2 ) 1 

9: APGstop := {£ > l { £l x } or {3g G SP( fc )(aj)|„ w with ||.g|| 2 < t«} 

io: x^ <- APG^Wj/W^^W^^- 1 ),^), APGstop] 

11: ^ fe+1 > f- 0« - ^^ 

12 
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end while 
return x so i ■(— jc^ 



Fig. 3.1. Outline of First- Order Augmented Lagrangian Algorithm (FAL) 

Recall that when using Algorithm APG we need to ensure that F^ n argmin 2 . eR „{p(' c: )(a;)} ^ 0. Let 
(fe) G argmin xeK „P( fe )(a;). Since P( fc )(xl fc) ) < P (&) («*), Ax* = b and ||a;*||i < rj, we have ||xl fc) ||i < 77W. 



:r 



Thus, xl fc) G P<». 

Next, we discuss the stopping cri terio n set in Line [9] in Figure [XT) The convexity parameter of the prox- 
function h^ is 1. Hence, Lemma 



2.1 



establishes that 



!*)U f W for />> J— \\*W - xV'-Vh, 



p(*)( u W)<p(*)( x W) + e C*) for £> A /4^|b 

where {u^- , }^gz + is the sequence of u-iterates when Algorithm APG is applied to the fc-th subproblem 

and L denotes the Lipschitz constant of the gradient Vf^ k '(x). Since ||ir; ||i < r]^ k ' ancl II -111 > IMI2, triangle 
inequality implies that 

Hsi*) _ S (*-D|| a < ll^ll, + ||a:(*-i)|| 2 < „W + Hs^-^Ha. (3.1) 

Since V/ (fc) (x) = A T (Ax - b - A (fc) (fe) ) it follows that L = <r^ ax (A), where tr max (A) denote the largest 
singular value of A. Thus, it follows that 

y|| H^) _ X (*-D|| a < a max (A)(r^ + II^IW/J =: 41- 

Consequently, it follows that the stopping criterion APGstop in Line [9J ensures that the iterate x^ satisfies 
one of the following two conditions 

(a) P( fe )(arW)<min xeMn p( fc )(a;) + eW, . . 

(b) 3g G dPW(x)\ xW with ||p|| 2 < r« l ^ /j 

where 9P^ fe ^(x)| 3 .(fe) denotes the set of subgradients of the function p( fc ' at x^ k \ When £ > ^ax in 
APGstop holds, Algorithm APG returns u^\ otherwise, when 3g G dP {k \x)\ v{ f) with ||p|| 2 < r (fc) , 
Algorithm APG returns u^. And we set x^ to what Algorithm APG returns. 



In Line 11 we update the dual variables 6^ ' in a manner that is standard for augmented Lagrangian 



algorithms. This completes the description of Algorithm FAL displayed in Figure [3d 



In the result below we establish that every limit point of the FAL iterate sequence {x^}kez, is an optimal 
solution of the basis pursuit problem. 

Theorem 3.1. Fixx^ £ E" , r\ > such that n > \\x*\\i and a sequence of parameters {(X ( - k \e^ k \T ( - k ^)} ke z + 
such that 

(i) penalty parameters, A' ' \ 0, 
(ii) approximate optimality parameters, e' ' \ such that ,. e (fc) , 2 < B\ for all k > 1, 

i \ ( k) ( k) 

(Hi) subgradient tolerance parameters, r*- ' \ such that jrgy < B 2 for all k > 1, and jjjtj — > as k — > 0. 

Let X — {x( k >}k£Z + denote the iterates computed by Algorithm FAL for this set of parameters. Then, X 
is a bounded sequence and any limit point x of X is an optimal solution of the basis pursuit problem (1.1 ). 

Proof. Since Imlx is finite for all fc > 1, it follows that the sequence X exists. 

As a first step towards establishing that X is bounded, we establish a uniform bound on the sequence of dual 
multipliers {9^ k '}k^z + - Suppose in the fc-th FAL iteration Algorithm APG terminates with the iterate 
x^ satisfying Q(a). Then Corollary [A~2l applied to P^(x) = A( fc >||x||i + \\Ax - b- \^6^\\ 2 2 guarantees 
that 

\\A T (Ax^ - b - AWflWjIU < V^ a max (A) + A«. 

Instead, if the iterate x^ satisfies ([3^2|(b), i.e. there exists gW £ dllzHiUfc) such that \\X^>q < ^>+A T {Ax { ^- 
&_A( fe )6K fe ))|| 2 <t^\ then 

\\A T (Ax w -b- \ W 6 W )\\„, <r ik) +X (k) \\q (k) \\ oc <r (fc) +A (fe) , (3.3) 

where the second inequality follows from the fact that ||g( fc )||oo < 1 for ah <r k ' £ 9||a;||i| a .(fc). 
Since 6>W = 0, 6>( fe+1 ) = 8^ - Ax ^ b for all k > 1 and A has full row-rank, it follows that 

||^ +1) || 2 < m l (M \\A T (Ax^ b A< fc W*>)|| 2l 

Jri ( { / 2e( k ) r (fe) 1 \ 

- o^ua) [™*{«~ ^\j jmy^ wi) + 1 )> yk ^ l {3 - 4) 

fit) (fc) 

The bounds ,. £ (fc) , 2 < B\, and j-^j < B 2 , together with (3.4| imply that 



(A(<=>) 2 



/ll 



\\e {k) \\ 2 < B e := V (m a x{^/2B 1 a max (A), B 2 } + 1) , Vfc > 1. (3.5) 

cr min (A) V y 



From this bound, it follows that 



AW 1 

||z (fe) ||i < V W = V + ^\\0 k \\t < B x :=n+-\^Bl (3.6) 

Thus, A" is a bounded sequence and it has a limit point. Let x denote any limit point and let K. C Z + denote 
a subsequence such that limbic x( k > = x. 

Suppose that there exists a further sub-sequence K, a c /C such that for all k £ K. a calls to Algorithm APG 
terminates with an iterate x^ k ' satisfying (|3.2[)(a). Then, for k £ K a , we have that 



{k) fW(xW) P( fc )(xl fc) ) + £ ( fc ) fW( z ,)+fW 

I|X l|x " A( fe ) " A« " A( fe ) 
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where the first inequality follows from the fact f( k '(x) > 0, second follows from the stopping condition, the 
third follows from the fact that P( k \xi ) < P( k \x*), the equality follows from the fact that Ax* = b, and 
the last inequality follows from the bounds ||0^||2 < Bq and ,. £ (fc) , 2 < B\. Since A' fc ' — >• 0, taking the limit 
along K a we get 



| 1 ^hmJ|^)|| 1 <||^|| 1+fc hm{A«Q Be2 + Bl )} 



= IMIi- (3.8) 



Next, consider feasibility of the limit point x. 

\\Ax^ - b - A<*W fc >||! < pW^W) < pW(a;( fc )) + e (fc) < pW(i.) + e W 

< A^II^Hi + \\\\ (k) 0^\\l + £ « < A«||z*||i + \{\ {k) B e ) 2 + e (fc) , 

where the first inequality follows from the fact X^ \\x^ \\ > 0, the third follows from the fact that P^ (xi ) < 
P( k ^(x*), the fourth follows from the fact Ax* = b, and the last follows from the bound ||0^- ) ||2 < Be- Taking 
the limit along K a , we have 

\\\Ax-b\\l<0, 

i.e. Ax — b. Since x is feasible, and ||x||i < ||£*||i, it follows that x is an optimal solution for the basis 
pursuit problem (1.1). 

Now, consider the complement case, i.e. there exists K £ K such that for all k £ Kb := K n {k > K}, 
calls to Algorithm APG terminate with an iterate x^ that satisfies (3.21(b). For all k £ Kb, there exists 
<j (fe) £ 9||a;||i| x (fc) such that 

|| A ( fe V fe ) + A T (Ax^ - b - X^9^)\\ 2 < r«. (3.9) 

Then, for all k £ Kb, 

\\AxW b\\ 2 < -J—\\A T (AxW b)h, 

CT m in(-4) 

< —±—(\\A T {Ax<- k) - 6- A<*¥*>) + \ {k) q {k) h + l|A (fc V fc) ||2 + \\A T (\WeW)\\ 2 ), 

0min{A) \ J 



(Tmin(A) J 



r(A) 

where a m i n (A) denotes the smallest non-zero singular value of A. The first inequality follows from the 
definition of <T m - ln (A), the second inequality follows from triangle inequality, and last inequality follows from 
(3.9), the bound ||#( fe )||2 < Bg, and fact that ||g||2 < \fn for any q £ d\\x\\\\ x (h) . Taking the limit along Kb, 
we have \\Ax — b\\ 2 < 0, or equivalently Ax = b. 

For all k £ Kb, q^ £ d\\x\\i\ x w, therefore, ||<7 < - fc - > |] oo < 1- Hence, there exists a subsequence K' b C Kb such 
that limfegx;' q( k > = q exists. One can easily show that q £ 9||a;||i|x- Dividing both sides of (3.9) by \( k \ we 
get 

(k) 

h W-A T e^h< T W) , (3-10) 

for all k £ K' b . Since lim fce K;' q^ — q, linifc e z + ^y = and A has full row rank, it follows that {9^ : k £ K' b } 
is a Cauchy sequence; therefore, lim^gx;' 9^ k+1 > = exists. Taking the limit of both sides of ( |3.10[ ) along K' b , 
we have 

q = A T 0. (3.11) 

7 



(3.11) together with that the fact that Ax = b and q £ d\\x\\i, it follows that the KKT conditions for 



optimality is satisfied at x] thus, x is optimal for the basis pursuit problem. □ 

In compressed sensing exact recovery occurs only when min{||a;||i : Ax = b} has a unique solution. The 
following Corollary establishes that FAL converges to this solution. 

Corollary 3.2. Suppose the basis pursuit problem min{||a;||i : Ax = b} has a unique optimal solution x 
Let {x( k '}k£Z + denote the sequence of iterates generated by Algorithm FAL, displayed in Figure 
corresponding to a sequence {(A^ fe ', e^\ t^ ')}kez + that satisfies all the conditions in Theorem 
linifc^oo x (fc) = x*. 



3.1 



3.1 



The 



Next, we characterize the finite iteration performance of FAL. This analysis will lead to a convergence rate 
result in Theorem 13.41 



Theorem 3.3. Fixx^ £ l n , r\ > such that r\ > \\x*\\i and a sequence of parameters {(A^e^r^)}^^ 



satisfying all the conditions in Theorem 3.1 In addition, suppose for all k > 1. t^' < c e^ k ' for some 
< c < 1. Let {x( k >}k£z + denote the sequence of iterates generated by Algorithm FAL, displayed in 
Figure \3~7\ for this set of parameters. Then, for all k > 1, 

(i) ||Ar( fc )-&|| 2 <2B e \( k \ 
(ii) |||aj^||i - ||x*||i| < max 



(f +B 1 max{l, 2cB x }) S°^ +Be y \\ {k) > 



where B e = ^^ (m&x{V2B~[ a max (A), B 2 ) + l), and B x = r\ + \\ {1) B\ 



Proof. First note that we have established the uniform bounds ||0^||2 < B$ and ||x^ fe ^||i < B x in (3.5) and 



(3.6), respectively. 



The dual update in Line [TT] of Algorithm FAL implies that 

\\Ax {k) -b\\ 2 < \\Ax ( V _ b - X^e^\\ 2 + A (fc) ||0 (fe) || 2 = A (fe) ||6K fc+1 )|| 2 + X {k} \\6 {k) \\ 2 < 2B X {k \ 

where the last inequality follows the fact that ||# ( - fc ' , ||2 < Bg. This establishes (p}. 
The dual update in Line [TT] of Algorithm FAL also implies that 

pW(a;W) = X^Wx^h + ^\\A X W - b - \i k W>\\i = X^ (\\x^h + ^H|0 (fc+1) ||i) . 

Thus, for all k > 1, 

ll* W l|i>^^^-^ll» (fc+1) ll2. (3-12) 

Next, we establish a lower bound for P'"(ij ). Consider the following primal-dual pair of problems: 

min^gR,. ||x||i niax raeK ™ b T w (3.13) 

subject to Ax — b, subject to ||A T w|| oc < 1. 



Let w* £ R m denote an optimal solution of the maximization problem in (3.13). Next, consider the primal 



dual pair of problems corresponding to the penalty formulation for the basis pursuit problem: 

miiXj.gR A||x||i + !||.A#-&-A0||! max„, G R m X{b + X0) T w - ^\\w\\\ (3.14) 

subject to H^wjloo < 1. 



Since w* is feasible for the maximization problem in (3.14) is as well, it follows that 



P^(xl fe) ) = min {A<*>||sc||i + -\\Ax-b- A< fc W fc >||2} 



16R" 



>AWr||af*||i-AW||^|| a |K| 



\W 



2 > 



A (fe) 



2 - -y-|F*||2 j ■ 



(3.15) 
(3.16) 



where (3.15) follows from weak duality for primal-dual problems (3.14) and (3.16) follows from strong duality 
for primal-dual problems (3.13), i.e. b T w* — ||x*||i, and Cauchy-Schwartz inequality. Thus, (3.12) implies 
that 



A« 



lla^lla > \\x4x ~ A (fc) (||««|| 9 ||t«.|| a + \\\w.\\$ + \\\0 {k+l) \\i) > IMi ^^ ^ 

where the second inequality follows from the fact that || J 4 T w„|| 00 < 1. 

The final step in the proof is to establish the upper bound in (hi]). Suppose the iterate x^ k ' satisfies the 
stopping condition (3.2) (a). In (3.7) we show that 



lM fc) lli < IMi + ^fV fe) ll2 + ^ < IMi + A (fc) Qs| + B^j 



(3.17) 



where we have used the fact that ||# < - fe - ) ||2 < Bg and ,. e (fc) , 2 < B\ 



Next, suppose x^ satisfies ([^(b). Let gW e dP^(x^) such that ||g (fe) || 2 < r^h Then the convexity of 
pi k ) implies that 



p(*)(a;(*)) _ pW( Xt ) < -( fl W) T ( Sj - x {k) ) < \\g^\\ 2 \\x, - z (fc) || 2 < -r (fe) ||ar* - ^ (fc) || ; 



(3.18) 



Now, an argument similar to the one that establishes the bound (3.7), implies that 

\( k ) T ( k ) /I \ 

||xW||i < IMK + -^\\0 {k) \\l + ^yll** - ^ (fc) l| 2 <Wi + X W ^-Bl + 2cB x B 1 J , (3.19) 

where we have used the fact that ||0^||2 < Bg, t^ < ce^ k \ and ,. £ (fc) , 2 < B\. The upper bound in (fnl) 



follows from (3.171 and (3.19). D 



Next, we use the bounds in Theorem 3.3 to compute a bound on the convergence rate of Algorithm FAL. 



Theorem 3.4. Fix an < a < 1. Then there exists and one can construct a sequence of pa rame ters 
{(X^ k \e^ k \ T^ k ')}kez + such that the iterates generated by Algorithm FAL, displayed in Figure 



3.1 



e-feasible, i.e. \\Ax^ — £»|| 2 < e, and e-optimal, I ||x^ fe ^||i — ||#*||i| < e, for all k > -/VpAL(e) = O (logi (7) ) • 
Moreover, FAL requires 



„ /xn2 ^16lk*||i 1 9 , f8nK 2 (A)\\ /l 

N mat < 2nn{Af " * H1 ■ - + - ■ logi K —f- = O - 

\a{i — a) e a <* \ e // \e 



(3.20) 



matrix-vector multiplies to compute an e-feasible, e-optimal iterate. 

Proof. Rescale the problem parameters (A, b) = - — K^(A, b). Then for the rescaled problem L — a max (A) — 
1, but the condition number k(A) — k(A). We will use Algorithm FAL to solve the rescaled problem (A, b). 
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Set AW = 1, eW = 2, and update 



e (fe+i) 



a-A« 



2max{l,r;+^K(A) 2 } 

For this choice of problem parameters, the constants 

C e (fe) ] e (l) ( T {k) 

Therefore, the uniform bounds on ||0( fc '||2 and ||x^ fe ^||i are given by 



. e (fc). 



(3.21) 



2max{l,?7+^K(^) 2 } 



< 1. 



B e = ^ (max{x/2^ cr max (A), B 2 } + l) < 3«(i4)Vn, S x = r, + ^5 2 < r, + ^k(A) 2 . 

CTmin^) V ' 2 2 



For the rescaled problem (A, b), the Theorem 3.3 guarantees that for all k > 1, 



l^lll-ll^lll 



. M< if + 4^ +fl ') iv,' 



At 1 ) a fe -i < 8n K (yl) 2 a 



2„fc-l 



where we use the fact that k(A) > 1. Thus, |||x( fe )||i — ||a;*||i| < e, for all fc € Z + such that 

/8n K (A) 2 ^ 



fc > lni 



1. 



<* \ e 
From Theorem |3.3| we also have that for all k > 1, 

\\Ax {k) -b\\ 2 < 2B e A (1) u k ~ x < 6VnK,(A). 
Thus \\Ax^ -b\\ 2 <e for all 



(3.22) 



k > In 



i / 6y/fJn(A) \ | L 



(3.23) 



From(3.22) and (3.23) it follows that for all e > 0, ./Vfal(£)i the number of FAL iterations required to 
compute an e-feasible and e-optimal solution, is at most 



AWe) < 



In 



lnK(Af 



(3.24) 



From the stopping condition APGstop defined in Line [9] of Algorithm FAL, it follows that the total 
number of the Algorithm APG iterations, Napg, required during A'fal(e) many FAL iterations is bounded 

by 

AfFAL(e) 

^apg= £ [41 

fc=l 

Nfal(e) 



E 

fe=l 

AfFAL(e) 



a max (I)(r? (fe) + H^- 1 )^ 




< £ fa + X^BJ)^ 



< 



fe=i 

Ar PAL (e)-l 

2ry g a 

fe=0 

2m/ 
l-a 



5 2 



a 



iV FA L(e) 



£ 2 



■a-^^W + ^-ATFALCe), 



(3.25) 
(3.26) 
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where the first inequality follows from the fact that ||x(* _1 >||2 < ||^ (fc— 1} || 1 < ?/ fc_1) and ||6>( fe )|| 2 < B e , the 
third equality follows from substituting for the parameters A^ fe_1 ^ and e^ k \ and the last inequality follows 
from the summing the geometric series. 

ALGORITHM FAL calls Algorithm APG with a quadratic prox function of the form h(x) = |||a; — x||| 
and the smooth function of the form f(x) = |||Aa; — b — A#|||. In each iteration of Algorithm APG, we 
need to compute the gradient V/(x) = A T (Ax — b — X9) and solve two constrained shrinkage problems of 
the form 



mm 



| A IMIi + gll* -5 !!! : IMIi ^ ? /} 



We show in Lemma A.4 in Appendix A that the complexity of solving a constrained shrinkage problem is 
O(nlog(n)). Thus, the computational complexity of each Algorithm APG iteration is dominated by the 
complexity of computing A T (Ax — b — X6). The complexity result now follows from the bound in (3.26). 

D 

4. Extension of FAL to noisy recovery. In this section, we briefly discuss how FAL can be extended 



to solve the noisy signal recovery problem of the form (1.3). See pQ for the further extensions of the 



methodology proposed here. Consider a noisy recovery problem 

min ||a;||i, 
s.t. || Ax- 6|| 7 < 5, 

where 7 € {1,2, 00}. The formulation with 7 £ {1, 00} are interesting when the measurement noise has 
a Laplacian or Extreme Value distribution. By introducing a slack variable s G M. m , the noisy recovery 
problem can be formulated as follows: 

min Nli, (41) 

s.t. Ax + s = b, ||s|| 7 < 6, ' 



We solve (4.1 ) by inexactly minimizing a sequence of sub-problems of the form 

pW(x, s) = A (fc) ||x||i + -\\Ax + s-b- A (fc ¥ fc) ||| (4.2) 

over sets F^ ={(x,s) : ||x||i <?/ fc \|js|j 7 < S} using Algorithm APG with the prox function h^(x, s) := 
|||a; - x^-^Wl + HI* - s (fe_1) ||| and initial iterate (a**- 1 ),^*- 1 )), where r/ fe ) := 77 + ^||6K fc )||2. 

In order to efficiently solve ( |4.1[ ) we need a good stopping condition for terminating Algorithm APG. Since 

max{/i( fc )(a;, S ) : (x, s) € F (fc) } < (m W ) 2 := \ {v {k) + ll^ (fc_1) l|2) 2 + | (v(-y)S + Hs^-^Ha) , where v(j) = 1, 
when 7 = 1,2 and y/m when 7 = 00, Lemma \2 . 1 1 implies that terminating Algorithm APG at iteration 

, where tml x := <r max (A)^ (fc ) J -%$, guarantees that (x^, s^) is e( fc )-optimal for p( k \ 



t max 



Recall that in solving the basis pursuit problem using Algorithm FAL we terminate Algorithm APG 
when either we are guaranteed that the iterate x^ k ' is e( fe )-optimal for P^ or there exists a sub-gradient 
g( k ) g 9P( fe )(a;( fe )) with a sufficiently small norm. In our numerical experiments, we found that we always 
terminated the call to Algorithm APG using the sub-gradient stopping condition. In order to extend FAL 



to efficiently solve (4.1 ) we need an analog of the sub-gradient condition. 

In FAL we were able to set the tolerance t^ small since we are guaranteed that argmin, 1 . e]R ,.{P( fe ) (a;)} C p( fc ) . 
Let d x P( k >(x, s) denote the projection of the set of sub-gradients on the a:- variables. The definition of F^ k > 
guarantees that x* € F^ for all (xi , s* ) € argmin{P W (x, s) : x e M",||s|| 7 < S}. Therefore, we 
can continue to use the gradient condition g x £ d x P( k ' (x^ k \ s^) with \\g x \\2 < t x . However, since s is 
constrained, we cannot force ||.g s ||2 to be close to zero; therefore, we need an alternative gradient condition. 

Fix x^ k \ Define Q{s) — P^ k \x^ k \s) and Q — {s : ||s|| 7 < 6}. The function ((s) is differentiable and 
V£(s) is Lipschitz continuous. Let ttq(s) := mm y ^Q{\\y — (s — -^V^(s)) || 2} denote the projection of the 
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gradient step s — -^VC(s) onto the constraint set Q. Then Theorem 2.2.7 in [115] establishes that s £ 
argmin{P( fe )(a;(*0,s) : |js|j 7 < 5} if and only if d(s,Q) := ip — 7Tq(s)|| = 0. Thus, we set the stopping- 
condition for Algorithm APG as follows: 



ArUblUr . ^ • ; , . ■ ■ / iilax \ ~ > j t ' \ i /,-, 



h>l£l:=a max (A)^^\ or 

{3 g x € d x P^(x, s)\ v (p iV m with \\g x \\ 2 < 4 k \ and d (y?> , f (t) ) < rf'j, 



where i4 and v« are the components of v^' corresponding to x and s variables in (4.2 1 



Consequently, it follows that the stopping criterion APGstop ensures that the iterate (x^ k \ s^) satisfies 
one of the following two conditions 



(a) P( fe )(x( fe ),s«) <mi n:ceR „,|| s || T < A -P( fe )(x, S ) + e( fe ), 

(b) 3 g x k) e d x PM(x,s)\ x m, s w with || 3 i fe) || 2 < A k) , and d ( s « F«) < 



(4.3) 
With this modification, Theorem|3.1[ Corollary |3.2| Theorem [373] and Theorem [374] all remain valid for the re- 



laxed recovery problem (4.1 ). Thus, FAL efficiently computes a solution for the noisy recovery problem ( 1.3 1. 
The per iteration cost of FAL is the cost of computing A T {Ax — b). Thus, the per iteration complexity of 
FAL is always 0(n 2 ) for any A; whereas the per iteration complexity of NESTA 4J is C(n 3 ) when A is not 
orthogonal. Examples of non-orthogonal A include Gaussian measurement matrices, partial psuedo-polar 
Fourier tranforms, and non-orthogonal partial wavelet transforms. 

5. Implementation details of Algorithm FAL for numerical experiments. In this section we 
describe the details of the implemented version of Algorithm FAL that we used in our numerical experi- 
ments described in the next section. 

5.1. Initial iterate x^ and bound rj. In our numerical experiments, when A was a partial DCT 

matrix, we set a;( ) = argmin{ 1 1 x | 1 2 : Ax = b} = A T (AA T )~ 1 b = A T b, where the last equality follows from 
the fact that A has orthogonal rows. The complexity of computing a:*- ) in this case is 0(nlog(n)). Since 
x* G argmin{||a;||i : Ax — b} and Ax^ = b, we have ||a:*||i < |ja;' ' ) ||i, we use r\ = ||:c( )||i in FAL. 

For general A, the computational cost of computing A(AA T )~ 1 b is 0(n 2 + m 3 ). In order to avoid 0(m 3 ) 
inversion cost, we set a;*- ' = A T b when A was a standard Gaussian matrix, i.e. each element A+j is an 
independent sample from the standard A/"(0, 1). Since ||x^ ^||i = ||A T &||i is no longer an upper bound on 
||x*||i, we set r\ as follows. It well known that for an mxn standard Gaussian matrix <? m i n (A) w (l — */^) \fn 
for large n (in fact, the approximation is very accurate even at n = 100) |14j . Then, 

IWIi < ||^ r (^ T )- 1 6||i < -^T-Jb\\ 2 < r, := 3 -^=||6|| 2 . (5.1) 



5.2. APGstop and FALstop conditions. When the ALGORITHM APG terminates with £ > 4„ we 
return u"'. Since gradient computation is computationally the most expensive step in Algorithm APG, 
and we are required to compute the gradient at the v iterates, we checked the sub-gradient stopping con- 
dition at the these iterates. Hence, we stopped Algorithm APG and returned «" when min{|| (7||| : g <E 
dPW(x)\ vW } < r( fc ) . 

g e dP {k) (x)\ vW if, and only if, there exists q e d\\x\\i \ vW such that g = \ ( ^q + V/ (JM \ where V/ (fe ' i} := 
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V/W(>M). Thus, it follows that 

min{|| S ||l : g € dP^(x)\ vW ] = min{||A«g + V/^||* : q € dM, \ vW } 

= J2 |A W +V 4 V/f'' ) | 2 + £ |-A (fe) +V/f^| 2 

{i:v { / } >0} {i:v\ e> <0} 

+ £ min lA^ + Vjf^ 2 , 

{i:u< °=0} 

= E |A (fe) +V/f'°| 2 + £ |-A (fc) +V/P| 2 

{»:t;, W >0} {i:-uf'<0} 



+ X minlUV/i^l-AW^O}, 

where V/} ' denote the i-th component of the gradient Vf( k ' 1 '. The first equality follows from the fact 
qi = +1 (resp. —1) whenever v\ ' > (resp. v\ ' < 0), and q t e [— 1, 1] for i such that u] ' = 0, and the last 
equality follows from explicitly computing the minimum. Thus, given V/*- fe '", the complexity of computing 
g is 0(n). 

We used different stopping conditions depending on the existence of measurement noise. First, we modified 
the stopping condition APGstop as follows 

APGstop = FALstop or U. > £^\ or hg e dP {k) (x)\ v(e) with ||p|| a < r (fe) } , 

Then, we set the FAL stopping condition FALstop as follows, 
(i) Noiseless measurements: 

FALstop = {||u w - u ( ' _1) ||oo < 7>, (5.2) 

where we set the threshold 7 by experimenting with a small instance of the problem. Algorithm FAL 
produces x so i — u^ when FALstop is true. 
(ii) Noisy measurements: 

FALsto ^{ n»«"'ih - T J' (5 ' 3) 

where 7 was set equal to the standard deviation of the noise, i.e. when b = Ax* + £ such that £ is a 
vector of i.i.d. random variables with standard deviation g, we set 7 = g. As in the noiseless case, we 
terminated FAL and set x so i = u^ when FALstop is true. 

5.3. Parameter sequence selection. Recall that we require 

A«\0, e«\0, j^y^Bu r« \ 0, ^ 0, r« < «« 



for Theorem |3.1[ Corollary |3.2[ Theorem |3.3| and Theorem |3.4| to hold. These conditions are satisfied if one 
updates the parameters as follows: for k > 1, 

A( fe+1 ' = c A A«, r( fe+1 ) = min{ CT r«,c T Us^lW, ^ +1) = c 2 A ^\ 

where g^ = argmin{||A^ fe ^q + V/^ fc ^ C 33 )lll : 9 € 9||x||i | x (*-i)} is the minimum norm sub-gradient at 
the initial iterate x^ k ~ x ' for the fc-th sub-problem for k > 1, and c\, c T and c\ are appropriately chosen 
constants in (0, 1). Note that we still have to set the initial iterates A*- 1 ', T^- 1 ', and e*- 1 ^. 
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In our preliminary numerical experiments with FAL we found that it was sufficient to set c T = 0.9, and 
the optimal choice for the constants c\ and c T was only a function sparsity ratio £ = ||x*||o/?n, and was 
effectively independent of the problem size n. Moreover, the optimal choice for c T = c\ — 0.01. We set 



ca(0 



0.9, if C > 0.9; 

0.85, if 0.9 > £ > 0.6; 

0.8, if 0.6 >£> 0.25; 

0.6, if 0.25 >£> 0.1; 

0.4, if £< 0.1, 



(5.4) 



and approximated the sparsity ratio £ at the beginning of fc-th FAL by ^ k ' = \\x^ k l '\\o/m- Since x^ ' is set 
arbitrarily, and is unlikely to be sparse, we use the following parameter update rule for k > 2, 



_(fc) 



nn{c T ^ k) )r^\c T \\g^\\ 2 }, A« = c A fc«) A'*" 1 ), 



,0+1) 



c\ 



(C (fe) ) 



2 Ak) 



(5.5) 



0.9. We set c\ and c T for each problem class separately. 



where c T (£ (fc) ) = c A (£( fe )) - 0.01, and c r 

We set the initial parameter values A*- 1 ), r' 1 ) and e^ 1 - 1 as follows. Let x^ — A T b denote the initial FAL 
iterate. We set 



-(i) 



ll^lk 



A' 1 ) =0.99||x (0) || c 



We use the duality gap at x^ to set e^. Since the bound r/ 1 ' = rj is set to ensure that aigmm xeR P^ (x) e 
i* 1 ' ) = {x : ||x||i < ir'}, we are effectively computing an unconstrained minimum. Since P^'(x) > 
for all x e R™, it follows that the duality gap of the initial iterate x^ is at most P^(x^). We set 
eW = 0.99PW(# (0) )- Then set e^ 1 ) = (c A fc) ) V fe ) for all k > 1. 

The step length is Algorithm APG is proportional to 4 where L denotes the Lipschitz constant of V/. 
In our numerical tests, we observed that taking long steps, i.e. steps of size 4, for t > 1, improves the speed 
of convergence in practice. And we chose the step-size t as a function of the sparsity ratio ||#*||o/W In 
iteration k, we approximate the sparsity ratio by ^ k ' = ||a;' fc_1 ^||o/?Ti, and set the Lipschitz constant to be 
used in Algorithm APG to 



L ( k ) = <H 



C (AA T ) 



where the function 



m = 



1.8, 

1.85, 

1.9, 

2, 

3, 



*(£(*>) 



if C > 0.9; 

if 0.9 > ^ > 0.6; 

if 0.6 > ^ > 0.25; 

if 0.25 >£> 0.1; 
if £< 0.1. 



(5.6) 



6. Numerical experiments. We conducted three sets of numerical experiments with FAL. 

1. In the first set of experiments we solve randomly generated basis pursuit problems when there is no 
measurement noise. Our goal in this set of experiments were to benchmark the practical performance 
of FAL and to compare FAL with the Nesterov-type algorithms, SPA [2], and NESTA j4], fixed point 
continuation algorithms, FPC, FPC-BB [16l|T7], an d FPC-AS [23], the alternating direction proximal 
gradient method YALL1 [24j . and a root finding algorithm SPGL1 [55]. We describe the results for 
this set of experiments in Section |6.1.1| 

2. In the second set of experiments, we compare the performance of FAL with the performances of the 
same set solvers, on randomly generated basis pursuit denoising problems when there is a non-trivial 
level of noise on the measurement vector b. The results for this set of experiments is described in 
Section [6~2~T1 
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3. In the third set of experiments, we compare the performance and robustness of FAL with the same 
solvers on a set of small sized, hard compressed sensing problems, CaltechTest[S]. The results for 
this set of experiments is reported in Section |6.3[ 

All the numerical experiments were conducted on a desktop with 4 dual-core AMD Optcron 2218 @2.6 GHz 
processors, 16GB RAM running MATLAB 7.12 on Fedora 14 operating system. 

6.1. Experiments with no measurement noise. 

6.1.1. Signal generation. We generated the target signal x* and the measurement matrix A using 
the experimental setup in 0]. In particular, we set 



(^) J = i(zeA)ef ) i0 5e ' 2) , 



(6.1) 



where, 



(i) A is constructed by randomly selecting s indices from the set {1, . . . , n}, 
(ii) 0j- , i G A, are IID Bernoulli random variables taking values ±1 with equal probability, 
(iii) 0j , i G A, are IID uniform [0, 1] random variables. 

We then scale Q^ such that min^ O, • = and max^ 0,- = 1. Therefore, the signal x* has a dynamic range 
of 100dB. 

We randomly selected m = \ frequencies from the set {0, . . . , n} and set the measurement matrix A G sft" lXTl 
to the partial DCT matrix corresponding to the chosen frequencies. The measurement vector, b, is then set 
to the DCT evaluated at the chosen frequencies, i.e. b = Ax*. 



6.1.2. Algorithm scaling results. For this set of numerical experiments, 

FALstop = {\\u {e) - z*||oo < 7}, 



(6.2) 



and Algorithm FAL produces x so i = u^ when FALstop is true, where a;* is the randomly generated 
target signal. Since the largest magnitude of the target signal, i.e. max^ |(x*)i| is 10 5 , the stopping condition 
FALstop implies that x so i has 5 + log 10 (l/7) digits of accuracy. We report results for 7 = 1, 1CP 1 and 



10- 



For the first iteration of FAL, we set of 



(i) 



,.(i) 



0.4. For k > 2, we used the functions c\(-) 



and t(-) described in (5.4) and (5.6), respectively. The parameters (X^ k \r^ k \e^) were set as described in 
Section 15.31 



Sparsity 


7 


Table 


s = m/100 
s = m/100 
s = m/100 


1 
0.1 
0.01 


Table 
Table 
Table 


6.2 
6.4 


s = m/10 
s = m/10 
s = m/10 


1 

0.1 
0.01 


Table 
Table 
Table 


6.5 
6.6 
5T 



Table 6.1 
Summary of numerical experiments 

The Table [6T] summarizes the sparsity conditions and the parameter settings for this set of experiments. The 
column marked Table lists the table where we display the results corresponding to the parameter setting of 
the particular row, e.g. the results for s = m/10 and 7 = 0.1 are displayed in Table 6.6 



We generated 10 random instances for each of the experimental conditions. In Tables |6.2||6.7[ the column 
labeled average lists the average taken over the 10 random instances, the columns labeled max list the 
maximum over the 10 instances. The rows labeled Nfal arid Napg list the total number of FAL and 
APG iterations required, respectively, to solve the instance for the given tolerance parameter 7. The row 
labeled CPU lists the running time in seconds and the row labeled nMat lists the tot al num ber of matrix- 
vector multiplies of the form Ax or Ay computed during the FAL run. In Section 6.1.3 we report two 
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nMat numbers for FPC-AS: the first one is the number of multiplications with A during the fixed point 
iterations and the second one is the number of multiplications with a reduced form of A during the subspace 
optimization iterations. All other rows are self-explanatory. 

The experiment results support the following conclusions. FAL is very efficient - it requires only 11-20 
iterations to converge to an high accuracy solution of the basis pursuit problem. For a given sparsity level s 
and a stopping criterion 7, Nfal is a very slowly growing function of the dimension n of the target signal. 
The total number of matrix-vector multiplies increases with the number of non-zero elements in the target 
signal x* - increasing s from m/100 to to/10 increases the number of matrix-vector multiplies by 30%. On 
problems with high sparsity, FAL always recovers the support of the target signal. We find that FAL is 
always able to discover the support of the target signal when the tolerance 7 is set sufficiently low. 





n=512x512 


n=256x256 


n=64x64 




Average 


Max 


Average 


Max 


Average 


Max 


Napg 


27.0 


27 


26.5 


27 


29.3 


34 


1 x bo i lli - ||x»||i|/||x»||i 


1.70E-06 


2.12E-06 


3.96E-06 


1.12E-05 


1.29E-05 


2.62E-05 


max{|(x„0i -(x»)i| : (x*)i #0} 


3.23E-01 


3.78E-01 


5.73E-01 


1.00E+00 


9.09E-01 


1.00E+00 


max{|(x B0l )i| : (x.)i =0} 


0.00E+00 


0.00E+00 


0.00E+00 


0.00E+00 


0.00E+00 


0.00E+00 


Ax bo i — b 2 


0.703 


0.781 


0.799 


1.809 


0.604 


0.850 


l|x B ollll 


5588229.9 


7000555.1 


1508014.9 


1838186.7 


193826.1 


311446.4 


|x,||i 


5588239.3 


7000565.2 


1508021.0 


1838194.7 


193828.2 


311447.1 


CPU 


13.5 


13.7 


3.1 


3.1 


0.2 


0.3 


N FAL 


14.0 


14.0 


13.6 


14.0 


12.6 


14.0 


nMat 


56 


56 


55 


56 


60.6 


70 



FAL scaling results: m - 



Table 6.2 
■■ ra/4, s = m/100 and \\x so i 



< 1 





n=512x512 


n=256x256 


n=64x64 




Average 


Max 


Average 


Max 


Average 


Max 


Napg 


28.9 


29 


28.4 


29 


35.1 


45 


II|x bo i||i - ||x»||i|/||x»||i 


6.79E-08 


2.69E-07 


1.03E-07 


2.49E-07 


4.96E-07 


1.06E-06 


max{|(x 80 i)i - (x„)i| : (x»)i # 0} 


2.58E-02 


9.51E-02 


5.57E-02 


9.07E-02 


6.22E-02 


8.76E-02 


max{|(x B0l )i| : (x.)i =0} 


0.00E+00 


0.00E+00 


0.00E+00 


0.00E+00 


0.00E+00 


0.00E+00 


Ax bo i — b 2 


0.047 


0.187 


0.062 


0.109 


0.037 


0.065 


x bo i ||l 


5588239.4 


7000565.5 


1508020.8 


1838194.5 


193828.1 


311446.9 


x, ||l 


5588239.3 


7000565.2 


1508021.0 


1838194.7 


193828.2 


311447.1 


CPU 


14.5 


14.7 


3.3 


3.4 


0.3 


0.4 


Nfal 


14.9 


15.0 


14.4 


15.0 


14.0 


14.0 


nMat 


59.8 


60 


58.8 


60 


72.2 


92 



FAL scaling results: m ■■ 



Table 6.3 
n/4, s = -m/100 and \\x ao i 



< io- 





n=512x512 


n=256x256 


n=64x64 




Average 


Max 


Average 


Max 


Average 


Max 


Napg 


29.9 


30 


29.5 


30 


37.7 


49 


1 x bo i ||i - ||x»||i|/||x»||i 


6.36E-08 


9.54E-08 


4.77E-08 


6.85E-08 


4.57E-08 


1.66E-07 


max{|(x ao i)i - (x«)i| : (x*)i jt 0} 


7.66E-03 


9.27E-03 


6.92E-03 


8.60E-03 


5.63E-03 


9.63E-03 


max{|(x BO ,)i| : (x.)i =0} 


0.00E+00 


0.00E+00 


0.00E+00 


O.OOE+00 


0.00E+00 


0.00E+00 


Ax Bol — b 2 


0.027 


0.032 


0.013 


0.015 


0.004 


0.006 


l|x BO l||l 


5588239.7 


7000565.6 


1508021.0 


1838194.8 


193828.2 


311447.1 


ll x *l|i 


5588239.3 


7000565.2 


1508021.0 


1838194.7 


193828.2 


311447.1 


CPU 


15.0 


15.2 


3.4 


3.5 


0.3 


0.4 


Nfal 


15.0 


15.0 


15.0 


15.0 


14.7 


16.0 


nMat 


61.8 


62 


61 


62 


77.4 


100 



FAL scaling results: m ■■ 



Table 6.4 
n/4, s = m/100 and \\x so i — a;*||oo < 10" 



The worst case bound in Theorem 



3.4 



suggests that FAL requires 0{\) Algorithm APG iterations (or 
equivalently, matrix-vector multiplies) to compute an e-feasible and e-optimal solution to the basis pursuit 
problem. In our numerical experiments we found that we required only 4 ± 1 APG iterations per FAL 
iteration; therefore, we required only 0(log(-)) APG iterations to compute an e-optimal solution. In order 
to clearly demonstrate this phenomenon, we created 5 random instances of x* € W 1 and partial DCT matrix 

64 2 and to 



A& 



such that n 



as described in Section 



6.1.1 



Any a;* created contains \j^~\ nonzero 
components such that the largest and smallest magnitude of those components are 10 5 and 1, respectively. 
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n=512x512 


n=256x256 


n=64x64 




Average 


Max 


Average 


Max 


Average 


Max 


Napg 


28.9 


29 


28.2 


29 


27.2 


28 


|||x bo i||i - ||x»||i|/||x»||i 


6.82E-07 


2.51E-06 


2.01E-06 


3.31E-06 


4.93E-06 


8.35E-06 


max{|(x so i)i - (x»)i| : (x,) 1 jt 0} 


6.46E-01 


9.83E-01 


8.26E-01 


9.88E-01 


7.92E-01 


1.00E+00 


max{|(x so i)t| : (x»)i = 0} 


1.31E-01 


2.24E-01 


1.83E-01 


4.07E-01 


1.19E-01 


2.12E-01 


Ax BO | — b 2 


2.432 


4.802 


2.020 


2.441 


0.857 


1.203 


ll x soll|l 


56631758.9 


59669790.2 


14250619.6 


15030777.3 


1033569.1 


1289376.0 


llx«lll 


56631797.7 


59669841.3 


14250648.3 


15030813.1 


1033574.2 


1289377.9 


CPU 


14.5 


14.6 


3.3 


3.4 


0.2 


0.6 


N FAL 


14.9 


15.0 


14.2 


15.0 


13.8 


14.0 


nMat 


59.8 


60 


58.4 


60 


56.4 


58 



Table 6.5 
FAL scaling results: m = ra/4, s = m/10 and \\x ao i — x*||oo < 1 







n=512x512 


n=256x256 


n=64x64 




Average 


Max 


Average 


Max 


Average 


Max 


Napg 


33.7 


35 


32.7 


34 


31.2 


33 


ll|x B oll|l - l|x«||l|/ 


x.lll 


5.30E-07 


7.38E-07 


6.70E-07 


1.03E-06 


4.56E-07 


8.46E-07 


max{|(x so i)i - (x»)i| : ( 


x»)i#0} 


9.21E-02 


9.98E-02 


8.96E-02 


9.58E-02 


7.06E-02 


8.92E-02 


max{|(x so i)i| : (x.) 


= 0} 


1.17E-03 


6.17E-03 


2.38E-03 


1.26E-02 


7.16E-03 


2.40E-02 


Ax,„i — b 2 


0.601 


0.748 


0.357 


0.469 


0.087 


0.120 


X bo 1 ||l 


56631827.7 


59669880.2 


14250657.8 


15030821.1 


1033574.7 


1289378.1 


l|x»||i 


56631797.7 


59669841.3 


14250648.3 


15030813.1 


1033574.2 


1289377.9 


CPU 


16.8 


17.6 


3.8 


4.0 


0.2 


0.7 


N FAL 


17.1 


18.0 


16.7 


17.0 


15.7 


16.0 


nMat 


69.4 


72 


67.4 


70 


64.4 


68 



FAL scaling results: m ■■ 



Table 6.6 
: ra/4, s = m/10 and 



•^sol **'* I 



< 10" 





n=512x512 


n=256x256 


n=64x64 




Average 


Max 


Average 


Max 


Average 


Max 


Napg 


38.7 


39 


38.3 


39 


37.7 


39 


|||x bo1 ||i-||x»||i|/||x,||i 


4.94E-08 


5.80E-08 


4.32E-08 


5.54E-08 


2.16E-08 


5.06E-08 


max{|(x eo i),-(x»)i| : (x,)i#0} 


8.71E-03 


9.96E-03 


8.51E-03 


9.88E-03 


7.19E-03 


9.41E-03 


max{|(x sol )i| : (x„), =0} 


1.58E-04 


1.11E-03 


O.OOE+00 


0.00E+00 


0.00E+00 


0.00E+00 


Ax BO | — b 2 


0.069 


0.084 


0.038 


0.042 


0.010 


0.011 


XsolHl 


56631794.9 


59669838.5 


14250647.7 


15030812.4 


1033574.2 


1289377.9 


x» ||l 


56631797.7 


59669841.3 


14250648.3 


15030813.1 


1033574.2 


1289377.9 


CPU 


19.3 


19.5 


4.4 


4.5 


0.3 


0.7 


N FAL 


19.7 


20.0 


19.3 


20.0 


18.8 


19.0 


nMat 


79.4 


80 


78.6 


80 


77.4 


80 



FAL scaling results: m ■■ 



Table 6.7 
: ra/4, s = m/10 and \\x so i — x*||, 



< 10" 



We solved this set of random instances with Algorithm FAL using FALstop = {\\u^ - u^ _1 )||oo < 
5 x 10 -11 }. As before, let Nfal denote the number of FAL iterations required to compute x so i satisfying 
the stopping condition FALstop. Let N^ k ^ be the number of Algorithm APG iterations done on the fc-th 
call until the inner stopping condition (3.2 1 holds and Napg = Sfc=i L N^ k ) be the total number of inner 
iterations, i.e. total number of APG iterations, to compute a; so ;. 

For all five instances N FA l ~ 45, N A pg ~ 95, maxi<,-<„{|(a; ao f)» - {x*)i\ : (x*)i ^ 0} w 7 x lCT 11 , 
maxi<i<„{|(x so ;)i| : (x*)j = 0} = and ||Ar so i — b\\2 ~ 1 x 10 -10 . These numbers show that each output 
x so i is 15 digits accurate and very close to feasibility. 

Let u {k ^ denote u (£) iterate on the fc-th APG call. For any 1 < k < N F al and 1 < I < N^ k \ define 

we plot the relative error, relative feasibility and relative optimality 



X; 



6.1 



i„ := u ( ' > . In Figure 

of the inner iterates x^J as functions of Algorithm APG cumulative iteration counter j e {!,..., Napg} 



From the plots in Figure 6.1 it is clear that, in practice, the complexity of computing an e-feasible, e-optimal 



iterate is C(log(l/e)), as opposed to the 0(l/e) worst case complexity bound established Theorem 3.4 
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Fig. 6.1. Relative solution error, feasibility and optimality vs APG iterations 
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6.1.3. Comparison with other solvers. In this section, we report the results of our numerical ex- 
periments comparing FAL with SPA 0, NESTA vl.l g] [http : //www . ac niTcaltech . edu/~nesta7] , FPC 
and FPC-BB from FPC v2.0 [16, 17] |http : //www . caam . rice . edu/-optimization/Ll/f pc/| , FPC- AS 



vl.21 H2 http : //www .caam. rice .edu/~optimization /Ll/FPC_AS/|, YALL1 vl.4 [2"4] http: //www. yalll.| 
blogs.rice.edu and SPGL1 vl.7 |^2) http://www.cs.ubc.ca/labs/scl/spgll/]. We set the parameter 
values for each of the six solvers so that they all produce a solution with £oo-error approximately equal to 
5 x 10~ 4 , i.e. ||a; so z — x*||oo «5x 1CP 4 . This criterion results in the following set of parameters (all other 
parameters not mentioned below are set to their default values). 

(a) FAL: We set 7 = 2.5 x 1(T 4 , and the initial update coefficients cl X) = 0.4, ci 1} = 0.4 and *W = 2. For 



k > 2, we used the functions c\(-) and £(■) des crib ed in (5.4) and (5.6|, respectively. The parameters 
(X^ k ',T^ k \ e^) were set as described in Section 



-5 jo) _ n o J 1 ) 



5.3 



(b) SPA: 7 = 5 x 10~ D , c\ ' = 0.2, c\ ' = 0.855, c £ = 0.8, c A = 0.9 and c M = c v = 0.1. For details on these 
parameters refer to [2]. 

(c) NESTA: jjl = lxl0~ 4 and 7 = lxl0~ 10 . NESTA solves mm.\\ Ax _ b \\ 2 <sPy.{x), where p^{x) = max{i T «- 

f IMIl : ||u|U < !}■ NESTA terminates when lp " {x _ T^ffi — - < 7, for some 7 > 0, where p M (a; (fc) ) = 

10. fc} Z^=l P^K- 1 )• 



III I I! 



{10, fc} 



(d) FPC and FPC-BB: A = 1.5 x 10 4 . FPC and FPC-BB solve min^,, ||x||i + \\\Ax - b\\ 2 2 . 

(e) FPC-AS: A = 7.5 x 10~ 5 . FPC-AS solves min xe sRn A||jc||i + \\\Ax - b\\\. 

(f) YALL1 (BP): 7 = 2 x 10~ 9 and nonorth — 0. YALL1 (BP) algorithm solves the basis pursuit problem 
min^gsRnlUxlli : Ax — b} and terminates when if^" 1 "^! < 7- nonorth — indicates that AA T = I. 

(g) SPGL1 (BP): optTol = 5x 10~ 3 and bpTol = 1 x 10" 6 . SPGL1 (BP) algorithm solves the basis pursuit 
problem mina;gsj}r>{||:E||i : Ax = b}. For the optimality and basis pursuit tolerance parameters, optTol 
and bpTol, refer to [22]. 

The termination criteria for the different solvers were not directly comparable since the different solvers 
solve slightly different formulations of the basis pursuit problem. However, we attempted to set the stopping 
parameter 7 for FAL so that on average the stopping criterion for FAL was more stringent than any of the 
other solvers. 

We tested each solver on the same set of 10 random instances of size n = 512 x 512 that were generated using 
the procedure described in Section |6.1.1| The results of the experiments are displayed in Table |6.8| The 
experimental results in Table [678] show that FAL was six times faster than SPA and NESTA, approximately 
four times faster than FPC, and two times faster than FPC-BB and FPC-AS algorithms. Moreover, unlike 
the other solvers, for all 10 instances, FAL accurately identified the support of the target signal, without 
any heuristic thresholding step. This feature of FAL is very appealing in practice. For signals with a large 
dynamic range, almost all of the state-of-the-art efficient algorithms produce a solution with many small non 
zeros terms, and it is often hard to determine this threshold. 

6.2. Experiments with measurement noise. 

6.2.1. Signal generation. For this set of experiments the target signal x* € 5ft™ was generated as 
follows: (#*).* = l(i € A) Oi, where 

(i) the set A was constructed by randomly selecting s indices from the set {1, . . . , n}, 
(ii) 0^, i € A, were independently, and identically distributed standard Gaussian random variables. 

The measurement matrix A and the measurement vector b were constructed as follows. We set the number 
of observations m = \j] . Each element Aij were sampled IID from a standard Normal distribution. The 
measurement b = Ax* + (,, where each component Q € K m was sampled IID from a mean and variance g 2 
Normal distribution. Therefore, the signal to noise ratio (SNR) of the measurement b was 



SNR(6) = 10 log 10 ( E[||ClT^] ' = 10 l0Sl ° ' ^ ' ' (6 ' 3) 
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or equivalently, g 2 — slO SNR ( b )/ 10 . For each random x* and A, we considered SNR equal to 20dB, 30dB 
and 40dB. 





FAL 


FPC 


-AS 




Average 


Max 


Average 


Max 


ll|x BO l||l - llx.llil/llx.Hi 


2.6E-09 


3.2E-09 


3.5E-08 


3.6E-08 


max{|(x sol )i - (x.)i| : (x,), ? 0} 


5.1E-04 


6.2E-04 


6.5E-04 


7.1E-04 


max{|(x sol )i| : (x,)i =0} 








1.2E-04 


1.5E-04 


Ax so i — b 2 


3.7E-03 


4.4E-03 


1.2E-02 


1.2E-02 


X B ol 1 


56631797.8 


59669841.4 


56631795.7 


59669839.3 


|| X* 111 


56631797.7 


59669841.3 


56631797.7 


59669841.3 


CPU 


11.0 


12.3 


22.2 


23.9 


nMat 


98 


99 


109 / 205.6 


109 / 208 










SPA 


NESTA 




Average 


Max 


Average 


Max 


|||x BO l||l - ||x»||l|/||x»||l 


1.0E-08 


1.1E-08 


6.5E-08 


6.7E-08 


max{|(x 30 ,),-(x # )i| : (x,)i#0} 


6.0E-04 


6.8E-04 


7.4E-04 


8.4E-04 


max{|(x sol )i| : (x,)i = 0} 


6.6E-05 


7.1E-05 


2.3E-04 


3.1E-04 


Ax so i — b 2 


6.0E-03 


6.3E-03 


4.0E-10 


4.1E-10 


l|x s oll|l 


56631798.3 


59669841.9 


56631801.4 


59669845.0 


|| X, ||l 


56631797.7 


59669841.3 


56631797.7 


59669841.3 


CPU 


67.3 


73.0 


72.1 


80.1 


nMat 


583.2 


587 


632.4 


636 










FPC 


FPC-BB 




Average 


Max 


Average 


Max 


1 x bo i i - ||x»||i|/||x»||i 


3.5E-08 


3.5E-08 


3.2E-08 


3.3E-08 


max{|(x 3oI ) 1 -(x # ) i | : (x») i7 i0} 


6.8E-04 


7.3E-04 


6.1E-04 


6.7E-04 


max{|(x so ,)i| : (x»)i =0} 


1.6E-04 


1.9E-04 


1.3E-04 


1.6E-04 


Ax so i - b 2 


1.2E-02 


1.2E-02 


1.1E-02 


1.1E-02 


l|x S ol||l 


56631795.7 


59669839.3 


56631795.9 


59669839.5 


|x,||i 


56631797.7 


59669841.3 


56631797.7 


59669841.3 


CPU 


40.4 


50.0 


22.7 


26.4 


nMat 


383.0 


387 


195.0 


195 










YALL1 (BP) 


SPGL1 




Average 


Max 


Average 


Max 


ll|x sol ||i-||x,||l|/l|x,||i 


9.4E-10 


1.4E-09 


3.2E-09 


6.7E-09 


max{|(x sol )i - (x»)i| : (x.)t jt 0} 


5.7E-04 


8.0E-04 


5.3E-04 


7.6E-04 


max{|(x so ,)i| : (x»)i =0} 


1.5E-19 


1.5E-19 


2.4E-04 


3.3E-04 


Ax so i - b 2 


4.4E-03 


5.5E-03 


4.2E-03 


4.9E-03 


X S ol 111 


56631797.7 


59669841.3 


56631797.5 


59669841.1 


l|x»||i 


56631797.7 


59669841.3 


56631797.7 


59669841.3 


CPU 


44.9 


53.3 


24.7 


28.4 


nMat 


453.0 


477 


200.7 


209 



Noiseless comparison tests: m ■■ 



Table 6.8 
: ra/4, s = m/10 and \\x so i — x*\\, 



5 x 10- 



6.2.2. Comparison with other solvers. For each noise level, we created 10 random instances of size 



n = 128 x 128 using the procedure described in Section 6.2.1 We stopped each algorithm when the relative 
^-distance of consecutive iterates are less than g, i.e. we impose the noisy stopping condition in Section [5. 2| 
for all the solvers. 



Some of the solvers we tested solve the penalty formulation min^gsR" ll^lli + t||Ae — ^lll- Hale et. al. [TB] 
proposed that when the measurement noise vector £ is a N(0, a) Gaussian vector, the penalty parameter A 

should be set to A = —f^rW V x ~n' m > where x\- a m denotes the 1 — a critical value of the \ 2 distribution 
with m degrees of freedom. We used the function getM_mu.m from FPC v. 2.0 package to compute A according 
to this formula. The other parameters were set as follows. 



1. FAL: We set the initial update coefficients c[ = 0.4, c T = 0.4 and t^ = 2. For k > 2, we used the 



(i) 



functions c\(-) and £(•) described in (5.4) and (5.6|, respectively. The parameters (X^ k ',T^ k ',e^ k ') were 
set as described in Section \KM 



2. SPA: c 



(0) 



0.2, c^ 



0.855, c £ = 0.8, c x = 0.9 and c u 



3. NESTA: NESTA solves min||^ :E _ b || 2 < l 5p A1 (x), where p M (x) = max{ru-^ 
and the model parameter 5 was set to \/m + 2\/2m g as described in [4] . 



0.1. See [2] the parameter definitions. 
#1H|| : |MU<1}. ^ = 2x10- 



4. FPC and FPC-BB: FPC and FPC-BB solve min x6 sRn 
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2X1 



Ax — &H2; A was set as described above. 



5. FPC-AS: FPC-AS solves min xe sR» A 

6. YALL1 (L1/L2): (L1/L2) option solves min xe sR» ||»||i • 

7. YALL1 (Ll/L2con): (Ll/L2con) option solves miniu 

set to v m 



l£ 3{" A||x||i + \\\Ax — fr||! an d ^ was set as described. 



2A 



Ax 

b\\ 2 <6 I 



6||| and A was set as described above. 



x i 



where the model parameter S was 



All the parameters other than ones explained above were set to their default values, 
experiments are displayed in Tables [679| - |6. 11} 



The results of the 





FAL 


FPC- 


AS 


FPC 


YALL1 


(L1/L2) 




Average 


Max 


Average 


Max 


Average 


Max 


Average 


Max 


||x sol -x»|| 2 /||x»|| 2 


0.007 


0.008 


0.007 


0.008 


0.012 


0.013 


0.008 


0.009 


max{|(x so i)i - (x.)i| : (x»)i jt 0} 


1.4E-02 


1.7E-02 


2.2E-02 


2.9E-02 


2.4E-02 


2.6E-02 


1.7E-02 


2.0E-02 


max{|(x sol )i| : (x„)i =0} 


1.1E-02 


1.3E-02 


9.0E-03 


1.1E-02 


1.4E-02 


1.6E-02 


1.4E-02 


1.6E-02 


Ax sol — b 2 


4.1E-02 


4.7E-02 


4.4E-02 


5.0E-02 


2.5E-02 


2.5E-02 


5.1E-02 


5.3E-02 


CPU 


13.1 


13.8 


18.9 


20.3 


156.9 


166.9 


31.8 


34.1 


nMat 


62.8 


65 


67.8/113.8 


71/117 


735.4 


769 


149.5 


157 





SPA 


NESTA 


FPC-BB 


YALL1 (Ll/L2con) 




Average 


Max 


Average 


Max 


Average 


Max 


Average 


Max 


||x so i - X»|| 2 /||X»||2 


0.011 


0.012 


0.019 


0.020 


0.012 


0.012 


0.013 


0.014 


max{|(x sol )i - (x»)i| : (x»)i # 0} 


2.3E-02 


3.5E-02 


4.1E-02 


4.5E-02 


2.3E-02 


2.6E-02 


2.2E-02 


2.6E-02 


max{|(x sol )i| : (x„)i =0} 


1.0E-02 


1.4E-02 


1.3E-02 


1.5E-02 


1.3E-02 


1.6E-02 


1.6E-02 


1.8E-02 


||Ax. i-b||a 


2.9E-02 


6.8E-02 


6.9E-02 


6.9E-02 


2.5E-02 


2.6E-02 


1.5E-02 


1.6E-02 


CPU 


66.8 


76.1 


264.1 


293.0 


39.4 


43.7 


28.5 


30.8 


nMat 


326.6 


375 


536.0 


553 


180.8 


189 


137.0 


137 


PreprocessTime 


N/A 


N/A 


581.7 


667.8 


N/A 


N/A 


N/A 


N/A 



Table 6.9 
Noisy comparative tests: SNR(b)=40dB 





FAL 


FPC-AS 


FPC 


YALL1 (L1/L2) 




Average 


Max 


Average 


Max 


Average 


Max 


Average 


Max 


||x sol -x»|| 2 /||x»|| 2 


0.024 


0.027 


0.023 


0.027 


0.036 


0.038 


0.031 


0.033 


max{|(x sol )i - (x»)i| : (x,), ^ 0} 


5.4E-02 


6.1E-02 


5.8E-02 


7.4E-02 


7.3E-02 


8.0E-02 


6.0E-02 


6.9E-02 


max{|(x sol )i| : (x„), =0} 


3.7E-02 


4.0E-02 


3.8E-02 


4.2E-02 


4.1E-02 


4.9E-02 


4.1E-02 


4.7E-02 


Ax BO | — b 2 


1.2E-01 


1.3E-01 


1.0E-01 


1.1E-01 


7.8E-02 


7.9E-02 


1.1E-01 


1.2E-01 


CPU 


10.8 


11.0 


19.8 


22.2 


90.1 


101 


21.7 


22.3 


nMat 


51.8 


53 


72.4/118.4 


79/125 


436.8 


493 


106.0 


107 





SPA 


NESTA 


FPC- 


BB 


YALL1 (Ll/L2con) 




Average 


Max 


Average 


Max 


Average 


Max 


Average 


Max 


X BO i - X» 2 / ||x» 2 


0.023 


0.025 


0.078 


0.082 


0.035 


0.036 


0.036 


0.038 


max{|(x ao i)i - (x.)i| : (x»)i ^ 0} 


5.1E-02 


5.7E-02 


1.7E-01 


2.0E-01 


6.9E-02 


7.6E-02 


6.0E-02 


6.8E-02 


max{|(x BO ,)i| : (x,)i =0} 


3.2E-02 


3.9E-02 


6.2E-02 


7.3E-02 


3.9E-02 


4.6E-02 


4.5E-02 


5.3E-02 


Ax,„i — b 2 


1.1E-01 


1.1E-01 


2.2E-01 


2.2E-01 


7.8E-02 


7.9E-02 


3.6E-02 


3.8E-02 


CPU 


55.4 


58.6 


207.1 


221.1 


25.8 


27.9 


20.6 


21.5 


nMat 


267.8 


287 


354.0 


363 


122.4 


135 


103.5 


107 


PreprocessTime 


N/A 


N/A 


581.7 


667.8 


N/A 


N/A 


N/A 


N/A 



Table 6.10 
Noisy comparative tests: SNR(b)=30dB 





FAL 


FPC- 


AS 


FPC 


YALL1 


(L1/L2) 




Average 


Max 


Average 


Max 


Average 


Max 


Average 


Max 


Ilx.0x-x.ll3/llx.lla 


0.090 


0.103 


0.099 


0.105 


0.104 


0.111 


0.100 


0.107 


max{|(x sol )i-(x»)i| : (x„)i#0} 


2.0E-01 


2.4E-01 


2.0E-01 


2.3E-01 


2.1E-01 


2.4E-01 


1.9E-01 


2.2E-01 


max{|(x so ,)i| : (x„), =0} 


1.3E-01 


1.9E-01 


1.2E-01 


1.4E-01 


1.2E-01 


1.4E-01 


1.3E-01 


1.6E-01 


Ax bo i — b 2 


3.4E-01 


5.1E-01 


2.8E-01 


2.9E-01 


2.5E-01 


2.6E-01 


2.9E-01 


2.9E-01 


CPU 


8.3 


8.6 


22.6 


23.9 


71.2 


74.5 


13.8 


14.1 


nMat 


39.6 


41 


78.8/124.8 


83/129 


349.6 


365 


67.0 


67 





SPA 


NESTA 


FPC- 


BB 


YALL1 (Ll/L2con) 




Average 


Max 


Average 


Max 


Average 


Max 


Average 


Max 


||x Bol -x.|| 2 /||x.|| 2 


0.108 


0.116 


0.251 


0.267 


0.100 


0.109 


0.124 


0.132 


max{|(x sol )i - (x.)i| : (x.), ^ 0} 


2.0E-01 


2.4E-01 


5.4E-01 


6.1E-01 


2.0E-01 


2.3E-01 


2.1E-01 


2.3E-01 


max{|(x so i)i| : (x„)i = 0} 


1.3E-01 


1.5E-01 


1.8E-01 


2.1E-01 


1.2E-01 


1.4E-01 


1.6E-01 


1.8E-01 


Ax BO | — b 2 


1.4E-01 


1.4E-01 


6.9E-01 


6.9E-01 


2.5E-01 


2.5E-01 


1.4E-01 


1.6E-01 


CPU 


54.8 


58.9 


170.1 


176.4 


23.8 


27.0 


15.4 


16.1 


nMat 


268.0 


289 


225.0 


233 


104.2 


109 


76.5 


77 


PreprocessTime 


N/A 


N/A 


581.7 


667.8 


N/A 


N/A 


N/A 


N/A 



Table 6.11 
Noisy comparative tests: SNR(b)= 



■20dB 
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The results clearly show that FAL is faster then the other state-of-the-art algorithms over the SNR range 
20dB-40dB. Since nMat only keeps tracks of matrix- vector multiplies with the full m x n measurement 
matrix, the CPU time CPU for some of the solvers is not completely determined by nMat. For instance, 
at the 40dB SNR level, FAL computed 62.8 and FPC-AS computed 67.8 matrix-vector multiplications on 
average; but the average CPU time for FAL was 13.1 sees, whereas it was 18.9 sees for FPC-AS. This 
difference in the CPU time is due to smaller size matrix-vector multiplies that FPC-AS computes during 
subspace optimization iterations. NESTA has the highest overhead cost: it needs SVD of A = UYiV T at the 
beginning since Gaussian A does not satisfy AA T = I. The preprocess time reported for NESTA shows the 
time to compute the SVD of A. Moreover, on top of the reported number of matrix vector multiplications 
with m x n matrices, NESTA also computes 2 matrix vector multiplications with m x m matrices at each 
iteration, which is not reported. 

6.3. Comparison with other solvers on hard instances. In order to demonstrate the robustness 
of FAL, we tested it on the Caltech test problems: CaltechTestl, CaltechTest2, CaltechTest3 and 

CaltechTest4 5j. This is a set of small-sized hard instances of CS problems. The hardness of these 



instances is due to the very large dynamic range of the nonzero components (see Table 6.12). For example, 
the target signal x* £ K 512 in CaltechTestl has 33 nonzero components with magnitude of 10 5 and 5 
components with magnitude of 1, i.e. x* has a dynamic range of lOOdB. 



problem 


n 


m 


s 


(magnitude, # elements of this magn 


tude) 


CaltechTestl 


512 


128 


38 


(10 a ,33), (1,5) 


CaltechTest2 


512 


128 


37 


(10 b ,32), (1,5) 


CaltechTest3 


512 


128 


32 


(10-^,31), (10- b ,l) 


CaltechTest4 


512 


102 


26 


(10 4 ,13), (1,12), (10~ J , 1) 



Table 6.12 
Characteristics of The Problems in CaltechTest Problem Set 



for CaltechT- 



The Caltech problems have measurement noise. However, the SNR(6) = 20 log 10 I " m>|| 
estl-CaltechTest4 problems is 228dB, 265dB, 168dB and 261dB, respectively. Since the SNR values are 
very high, we solved this set of problems solve them via basis pursuit formulation (1.1) using FAL, SPA, 
NESTA, YALL1 and SPGL1; and via unconstrained basis pursuit denoising formulation (1.2) with small 
A > values using FPC, FPC-BB and FPC-AS. 

For FAL, SPA, NESTA vl.l, FPC and FPC-BB that come with v2.0 solver package, FPC-AS vl.21, 
YALL1 vl.4 and SPGL1 vl.7, we chose parameter values that produced a solution x so \ with high accu- 
racy in reasonable time. 



5 x 10 9 and the initial update coefficients c 



(i) 



0.8. 



5.3 



1. FAL: We set 7 

k > 2, we used the functions c\(-) and £(•) de scrib ed in (5.4) and (5.6) 
(A^j-rC^e^)) were set as described in Section 

2. SPA: 7 = 1 x 10~ 8 , c-r ] = 0.1, c^ = 0.76, c e = 
parameters refer to [2J. 

- & and 7 = 1 x 10~ 16 



c [ r ] = 0.8 and *W = 1.9. For 
respectively. The parameters 



c\ = 0.95 and Cu = c v = 0.4. For details on these 



See Section 6.1.3 for the definition of n and 7. 



3. NESTA: /j = 1 x 10" 

4. FPC and FPC-BB: \ = 1 X 10 lu , xtol = lO" 10 , gtol = 10~ 8 and mxitr = 20000, where xtol, gtol set 
the termination conditions on the relative change in iterates and gradient, respectively, and mxitr is the 
iteration limit allowed. See Section f6. 1.31 for the definition of A. 

5. FPC-AS: A = 1 x 10 -10 and gtol = 10 -14 , where gtol is the termination criterion on the maximum norm 
of sub-gradient. See Section [6. 1.3 for the definition of A. 

6. YALL1 (BP): 7 = 1 x 10~ n and nonorth — 0. See the item describing YALL1 (BP) for the definition 
of of 7 and nonorth in Section |6.1.3| 

7. SPGL1 (BP): optTol = 1 x 10 7 , bpTol = 1 x 10~ 9 and decTol = 1 x 10~ 7 . For the details on optimality 
and basis pursuit tolerance parameters, optTol, bpTol and decTol, refer to |22) . 

These parameter values were fixed for all 4 test problems and all other parameters are set to their default 
values. The termination criteria were not directly comparable since the different solvers use a slightly different 
formulation of the basis pursuit problem. However, we attempted to set the stopping parameter 7 such that 
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Problem 


Solver 


llx.lk 


ll X SOl||l 


rel.err 


inf.err+ 


m/.erro 


IMIa 


CPU 


nMat 


nnz 




FAL 




3300005.00 


5.15E-12 


9.94E-07 





1.16E-08 


0.598 


1715 


38 




SPA 




3300005.00 


1.85E-10 


3.05E-05 


1.68E-05 


4.85E-06 


7.783 


20305 


512 


Caltechl 


NESTA 


3300005 


3300005.00 


2.43E-10 


4.01E-05 


2.18E-05 


1.06E-10 


9.902 


18432 


512 


FPC 


3300002.44 


3.05E-06 


5.17E-01 


2.64E-01 


1.78E-01 


22.509 


40001 


109 




FPC-BB 




3300002.44 


8.46E-06 


5.16E-01 


2.63E-01 


1.78E-01 


44.600 


40001 


109 




FPC- AS 




3300005.00 


5.15E-12 


9.97E-07 


8.97E-10 


1.62E-09 


0.375 


109 / 393 


63 




YALL1 




3300005.00 


5.61E-11 


8.51E-05 


1.19E-18 


6.09E-05 


5.486 


14492 


276 




SPGL1 




3300005.11 


1.19E-06 


1.20E+00 


6.53E-01 


9.94E-08 


9.989 


17705 


171 




FAL 




3200005.00 


7.17E-14 


1.41E-08 





7.03E-09 


0.358 


971 


37 




SPA 




3200005.00 


1.19E-10 


2.04E-05 


1.38E-05 


4.73E-06 


5.651 


14001 


512 


Caltech2 


NESTA 


3300005 


3200005.00 


1.24E-10 


2.10E-05 


1.47E-05 


9.34E-11 


3.826 


7204 


512 


FPC 


3200004.97 


2.15E-08 


3.72E-03 


2.32E-03 


2.39E-03 


23.540 


40001 


96 




FPC-BB 




3200004.47 


3.43E-07 


5.92E-02 


3.70E-02 


3.82E-02 


43.019 


40001 


96 




FPC- AS 




3200005.00 


7.58E-14 


1.78E-08 


2.03E-09 


1.88E-09 


0.222 


127 / 407 


63 




YALL1 




3200005.00 


6.29E-11 


1.01E-04 


1.15E-18 


5.76E-05 


1.337 


3137 


275 




SPGL1 




3200005.00 


1.35E-11 


1.35E-05 


8.15E-06 


6.96E-08 


16.413 


28008 


212 




FAL 




6.20000101 


4.03E-08 


1.49E-08 





1.35E-08 


0.166 


359 


32 




SPA 




6.19999388 


5.82E-06 


1.85E-06 


8.99E-07 


8.78E-07 


4.663 


9767 


512 


Caltech3 


NESTA 


6.200000974 


6.20007451 


5.02E-05 


1.51E-05 


8.72E-06 


1.96E-16 


5.131 


8326 


512 


FPC 


6.20000076 


6.50E-08 


2.01E-08 


1.04E-08 


1.80E-08 


27.730 


40001 


78 




FPC-BB 




6.19975503 


7.09E-05 


2.22E-05 


1.06E-05 


1.84E-05 


37.365 


40001 


80 




FPC-AS 




6.20000098 


1.46E-09 


3.78E-10 


4.73E-10 


1.23E-09 


0.137 


93 / 271 


67 




YALL1 




6.30373200 


8.53E-02 


1.47E-01 


1.19E-01 


3.95E-16 


23.048 


50002 


321 




SPGL1 




6.20000438 


4.14E-06 


6.62E-06 


5.92E-06 


9.99E-08 


8.275 


11885 


131 




FAL 




130012.010 


1.28E-12 


2.16E-08 





1.24E-08 


0.207 


487 


26 




SPA 




130012.010 


3.80E-09 


4.92E-05 


1.86E-05 


1.15E-05 


3.788 


8221 


512 


Caltech4 


NESTA 


130012.01 


130012.010 


1.87E-09 


2.37E-05 


9.61E-06 


5.71E-12 


3.583 


5904 


512 


FPC 


130012.008 


2.01E-08 


2.62E-04 


9.07E-05 


1.39E-04 


26.283 


40001 


71 




FPC-BB 




130010.234 


1.92E-05 


2.39E-01 


7.46E-02 


1.39E-01 


44.804 


40001 


62 




FPC-AS 




130012.010 


8.31E-13 


9.01E-09 


8.62E-09 


3.86E-09 


0.270 


145 / 523 


50 




YALL1 




130012.010 


8.99E-11 


5.34E-06 


7.03E-20 


3.64E-06 


4.747 


10682 


305 




SPGL1 




130012.010 


9.57E-11 


4.42E-06 


2.40E-06 


9.97E-08 


9.641 


14647 


97 



Table 6.13 
CaltechTest Problem Set 



on the average the stopping criterion for FAL was more stringent than those for the other algorithms we 
tested. The results of the experiments are displayed in Table [67T3] In Table [6T3} the row labeled CPU lists 
the row labeled rel.err lists the relative error the solution, i.e rel.err = iu~|| , the row labeled inf.err + 
lists the absolute error on the nonzero components of x*, i.e inf.err + = max{|(a: so ;)i — (#*)i| : (%*)i y^ 0}, 
the row labeled inf .err lists the absolute error on the zero components of x* , without any thresholding or 
post-processing. None of the solvers, other than FAL, were able to identify the true support of the target 
signal for any of the CaltechTest instances. 

7. Conclusion. We propose a first-order augmented lagrangian algorithm (FAL) for basis pursuit. FAL 
computes a solution to the basis pursuit problem by solving a sequence of augmented lagrangian subproblems, 
and each subproblem is solved using a variant of the infinite memory proximal gradient algorithm (Algorithm 
3) [21] • We prove that FAL iterates converge to the optimal solution of the basis pursuit problem whenever 
it is unique, which is true with overwhelming probability for compressed sensing problems (In [7] Candes 
and Tao have shown that for random measurement A matrices the resulting basis pursuit problem has 
a unique solution with very high probability). We are able to prove FAL needs at most C(e _1 ) matrix- 
vector multiplies to compute an e-feasible and e-optimal solution. However, in our numerical experiments 
we observe that we only need O(log(e -1 )) matrix- vector multiplies! We found that for a fixed measurement 
ratio m/n, sparsity ratios s/n, and solution accuracy 7, the number of matrix-vector multiplies computed 
by FAL were effectively independent of the dimension n. This allows us to tune the algorithm parameters 
on small problems and then use these parameters for all larger problems with the same measurement and 
sparsity ratios. The numerical results reported in this paper clearly show that FAL solves both the noise-less 
and noisy versions of the compressive sensing problems very efficiently. 

8. Acknowledgments. We thank the anonymous referees for their insightful comments that signifi- 
cantly improved both the algorithm and the paper. We also thank Professor Y. Zhang for helping us better 
understand the capabilities of YALL1. 
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Appendix A. Auxiliary results. 

Theorem A.l. Let f : W l — > R be a convex function. Suppose the V/ is Lipschitz continuous with the 
Lipschitz constant L. Fix e > 0. Suppose x <E R™ satisfies A||x||i + f(x) — (A||a;*||i + f{x*)) < e, where 
x* e argmin{Aj|.xj|i + f(x) : x € R"}. Then 

^ E (|V/^)|-A) 2 <6. (A.l) 

i:|V/«(S)|>A 



The bound (A.l) implies \\^ f(x)\\ co < v 2Le + A. 



Proof. Triangular inequality for ||.||i and Lipschitz continuity of V/ implies that for all y € 

A|M|i + f(y) < A||S||i + /(£) + V/(s) T (y - s) + ^\\y - x\\j + X\\y - x\\i. 
Taking the minimum with respect to y, we get 

A||xli + f(x*) < \\\x\\i + f(x) + min \vf{x) T {y - x) + hy - xg + X\\y - x\U } ( A.2] 

ySR" I Z 



Let w = V/(x). Then 



y* = argmiiJ w T {y-x) + -\\y - x\\j + X\\y - x||i \ , (A.3) 

j/eR™ I z J 

(A.4) 
(A.5) 
(A.6) 



• f 1 II W ll2 

argmnW -\\y - x + -\\ 2 
yew I z -k 


A i 


y-»lli > 


x + sign 1 — — 1 max < 


— w 
L 


-H 


— sign(w) 
.r H v ; max{|w| 


-A, 


o}. 



where (A.5) follows from the fact that argmin zSR „{j/|jz|ji + ^\\z — C|||} = sign(^) max{|£| — ^,0}, where 
is component-wise multiplication operator |17j . and all other vector operators such as | • |, sign(-) and 
max{-, •} are defined to operate component-wise. Substituting y* in (|A.2[), we get 



min \ w T (y-x) + -\\y - x\\j + X\\y - x\ 
iyGR™ |_ 2 

-Jj max{K| -A,0} + ^ max{K| - A, 0} 2 + - E max{K| - A, 0}, 



Y E f-KI + J(KI-A) + A)(K|-A), 



L 

i:\wi\>\ 



l - E (Kl-A) 2 . (A.7) 



2L 

i:\wi\>\ 



The bound (A.l) follows from the fact A||x||i +f(x) — (A||a;*||i + f(x*)) < e. The bound (A.l) clearly implies 
that \wi\ < \/2Le + X for all i, i.e. HV/^Jloo < V2Le + X. D 

Corollary A. 2. Suppose A e M mx " with m < n and full rank. Let P(x) = X\\x\\\ + ^\\Ax - b - X6\\1- 
Suppose x is e- optimal for mm^R™ P(x), i.e. < P{x) — min x gnu P(x) < e. Then 

||A T (Ax-6-A0)|| oo < VTe a max {A) + A, 

\\Ax-b-X9\\ 2 < -^ (Jle a max (A) + X) , (A ' 8j 
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where o~ max (A) denotes the maximum singular value of A. 

Proof. Let f(x) = \\\Ax -b- A0||§, then V/(x) = A T (Ax -b- X6). For any x,y E R", we have 

||V/(x) - V/(y)|| a = \\A T A(x - y)\\ 2 < a 2 max (A)\\x - y\\ 2 , 

where o- max (A) is the maximum singular- value of A. Thus, / : W 1 —¥ 3i is a convex function and V/ is 
Lipschitz continuous with the constant L — cr 2 nax {A). 



Since x is an e-optimal solution to min xg Rn P(x) = mm xe R»{A||x||i + /(x)}, Theorem A.l immediately 
implies the first bound in (A.8). The second bound follows from the fact that 

|L4x-fr-A0|| 2 < " AT( f -?-**)»' <^* m As-l>-mU 

where the first inequality follows the definition of cr m in(^4) and the second from the bound \\y\\ 2 < V n ll2/ll°° 
for all y£W\ □ 

Lemma A. 3. Let f : R™ — > R be a strictly convex function and S C R™ be a closed, convex set. Let 
x* s — argmin^gg f{%) and x* = argmin^gjjn f{x). Suppose the unconstrained optimum x* SjL S, then x* s £ dS, 
where OS denotes the boundary of the set S. 

Proof. We will establish the result by contradiction. Suppose x* s £ int(S). Then, there exists an e > such 
that B e — {x £ E™ : \\x - x* s \\ 2 < e} C S. Since / is strictly convex and x* ^ x* s , f(x*) < f(x* s ). 

Fix < A < |i s _^.|| < 1. Then x x = Xx* + (1 - X)x* s £ B t C S. Since / is strictly convex, 

f(x x ) < A/Or*) + (1 - X)f(x) < f(x* s ). (A.9) 

This contradicts the fact that x% = argmin a , eS {/(cc)}. Thus, x* s £ 5\int(S') = dS. D 
Lemma A. 4. Fix y £ R n , A > and r\ > 0. Let P(x) = X\\x\\\ + |||x — y\\ 2 and 

x* = argmin{P(x) : ||x||i < f]}. (A. 10) 

Then the deterministic complexity of computing x* is O(nlog(n)), and the randomized complexity is 0(n). 



Proof. Since P (x) is strongly convex, (A. 10) has a unique primal optimal solution. Also, since the opti- 
mization problem (A. 10) satisfies Slater's constraint qualification, strong duality holds, and since the primal 
optimal value bounded, the dual optimal value is attained. 

Let 

C(x,a) = A||a?||i + ^\\x-y\\l + a{\\x\\i-r]), (A.ll) 

= (A + a)||a;||i + ^||a:-i/||l-a»7, (A.12) 

denote the Lagrangian function. Since strong duality holds, x* is a minimizer of C(x, a*), where a* denote 
the optimal dual solution. Since £(x,a*) is a strictly convex function of x, x* is the unique minimizer of 

C(x,a*). Let 

x*(a) = argmin£(x, a) = sign(y) max{|y| — (A + a),0}. (A. 13) 

It is clear that x* — x*(a*). In the rest of this proof, we show how to efficiently compute a* . 

Note that x*(0) — sign(?/)0max{|?/| — A,0} is the unique unconstrained minimizer of P(x). When ||x*(0)||i < 
77, then trivially x* = x*(0). However, when ||x*(0)||i > n, Lemma A. 3 implies that x* £ d{x £ R n : \\x\\i < 
n}, i.e. ||x*||i = rj. Therefore, 

a £{a>0: ||x*(a)||i = rj}. (A.14) 
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From (A. 13 1, it follows that 



\\x*(a)\\ 



J2 (Gw|-A)-a)=J>*(0)|-ar 



(A.15) 



i-\Vi\-^> a 



Note that ||x*(a)||i is a strictly decreasing continuous function of a. Since ||x*(0)||i > 77, there exists a 
unique a > such that |ja;*(d)||i = r\. From (A. 14), we can conclude that a* — a. 

To compute a such that ||x*(d)||i — r/, sort z — |x*(0)| in decreasing order. Let zui denote the i-th largest 
component of z. It is clear that ||a;*(«Jr n ])||i > 77 > and ||a;*(a)||i = for all a > wm. Hence, there exists 
an index 1 < k < n such that ||cc* (tf[fc])|| 1 < T) and ||x*(iu[fe+i])||i > r), and it follows that 



1 



E- 



>m - ? ' 



(A.16) 



Thus, x* = x*(a*) can be computed in 0(n\og(n)) operations. Singer et al [T3] show that a* with a 0(n) 
randomized complexity using a slightly modified version of the randomized median finding algorithm. □ 
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